1.
medRxiv ; 2024 Feb 28.
Article in English | MEDLINE | ID: mdl-38464285

ABSTRACT

Background: Studies have identified individual blood biomarkers associated with chronic obstructive pulmonary disease (COPD) and related phenotypes. However, complex diseases such as COPD typically involve changes in multiple molecules with interconnections that may not be captured when considering single molecular features. Methods: Leveraging proteomic data from 3,173 COPDGene Non-Hispanic White (NHW) and African American (AA) participants, we applied sparse multiple canonical correlation network analysis (SmCCNet) to 4,776 proteins assayed on the SomaScan v4.0 platform to derive sparse networks of proteins associated with current vs. former smoking status, airflow obstruction, and emphysema quantitated from high-resolution computed tomography scans. We then used NetSHy, a dimension reduction technique leveraging network topology, to produce summary scores of each proteomic network, referred to as NetSHy scores. We next performed a genome-wide association study (GWAS) to identify variants associated with the NetSHy scores, or network quantitative trait loci (nQTLs). Finally, we evaluated the replicability of the networks in an independent cohort, SPIROMICS. Results: We identified networks of 13 to 104 proteins for each phenotype and exposure in NHW and AA, and the derived NetSHy scores were significantly associated with the variables of interest. Networks included known (sRAGE, ALPP, MIP1) and novel molecules (CA10, CPB1, HIS3, PXDN) and interactions involved in COPD pathogenesis. We observed 7 nQTLs associated with NetSHy scores, 4 of which remained after conditional analysis. Networks for smoking status and emphysema, but not airflow obstruction, demonstrated a high degree of replicability across race groups and cohorts. Conclusions: In this work, we apply state-of-the-art molecular network generation and summarization approaches to proteomic data from COPDGene participants to uncover protein networks associated with COPD phenotypes. We further identify genetic associations with networks. This work discovers protein networks containing known and novel proteins and protein interactions associated with clinically relevant COPD phenotypes across race groups and cohorts.

2.
bioRxiv ; 2024 Jan 25.
Article in English | MEDLINE | ID: mdl-38328226

ABSTRACT

Multiple -omics (genomics, proteomics, etc.) profiles are commonly generated to gain insight into a disease or physiological system. Constructing multi-omics networks with respect to the trait(s) of interest provides an opportunity to understand relationships between molecular features, but integration is challenging because it involves multiple high-dimensional data sets. One approach is to use canonical correlation to integrate one or two omics types and a single trait of interest. However, these types of methods may be limited due to (1) not accounting for higher-order correlations existing among features, (2) computational inefficiency when extending to more than two omics data types with a penalty term-based sparsity method, and (3) lack of flexibility for focusing on specific correlations (e.g., omics-to-phenotype correlation versus omics-to-omics correlations). In this work, we have developed a novel multi-omics network analysis pipeline called Sparse Generalized Tensor Canonical Correlation Analysis Network Inference (SGTCCA-Net) that can effectively overcome these limitations. We also introduce an implementation to improve the summarization of networks for downstream analyses. Simulation and real-data experiments demonstrate the effectiveness of our novel method for inferring omics networks and features of interest.

3.
Sci Rep ; 13(1): 9254, 2023 06 07.
Article in English | MEDLINE | ID: mdl-37286633

ABSTRACT

Privacy protection is a core principle of genomic but not proteomic research. We identified independent single nucleotide polymorphism (SNP) protein quantitative trait loci (pQTL) from COPDGene and Jackson Heart Study (JHS), calculated continuous protein level genotype probabilities, and then applied a naïve Bayesian approach to link SomaScan 1.3K proteomes to genomes for 2812 independent subjects from COPDGene, JHS, SubPopulations and InteRmediate Outcome Measures In COPD Study (SPIROMICS) and Multi-Ethnic Study of Atherosclerosis (MESA). We linked 90-95% of proteomes to their correct genome, and for 95-99% the correct genome was among the 1% most likely links. The linking accuracy in subjects with African ancestry was lower (~60%) unless training included diverse subjects. With larger profiling (SomaScan 5K) in the Atherosclerosis Risk in Communities (ARIC) study, correct identification was > 99% even in mixed ancestry populations. We also linked proteomes-to-proteomes and used the proteome only to determine features such as sex, ancestry, and first-degree relatives. When serial proteomes are available, the linking algorithm can be used to identify and correct mislabeled samples. This work also demonstrates the importance of including diverse populations in omics research and that large proteomic datasets (> 1000 proteins) can be accurately linked to a specific genome through pQTL knowledge and should not be considered unidentifiable.
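The linking step described in this abstract can be illustrated with a minimal sketch. It assumes each pQTL protein level is Gaussian given the genotype (0, 1, or 2 copies of the effect allele), that proteins are conditionally independent (the "naïve" part), and that the prior over genomes is uniform. The function names, genotype means, and standard deviations are hypothetical and are not taken from the study.

```python
import math

def gaussian_logpdf(x, mean, sd):
    """Log density of a normal distribution."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)

def rank_genomes(protein_levels, genomes, pqtl_models):
    """Rank candidate genomes for one proteome, most likely first.

    protein_levels: standardized level of each pQTL protein.
    genomes: dict mapping genome ID -> genotype (0/1/2) at each pQTL.
    pqtl_models: per-protein (mean level by genotype, sd); assumed values.
    """
    scores = {}
    for genome_id, genotypes in genomes.items():
        # Naive Bayes: sum per-protein log-likelihoods (uniform prior).
        scores[genome_id] = sum(
            gaussian_logpdf(level, means[g], sd)
            for level, g, (means, sd) in zip(protein_levels, genotypes, pqtl_models)
        )
    return sorted(scores, key=scores.get, reverse=True)

# Toy data, invented for illustration only.
models = [({0: -1.0, 1: 0.0, 2: 1.0}, 0.5)] * 3
genomes = {"G1": [0, 2, 1], "G2": [2, 0, 1], "G3": [1, 1, 1]}
ranking = rank_genomes([-0.9, 1.2, 0.1], genomes, models)
print(ranking)
```

With more pQTL proteins, the summed log-likelihoods separate candidate genomes more sharply, which is consistent with the higher accuracy the abstract reports for the larger panel.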


Subjects
Atherosclerosis , Proteome , Humans , Proteome/genetics , Bayes Theorem , Privacy , Genome-Wide Association Study , Atherosclerosis/genetics , Single Nucleotide Polymorphism
4.
PLoS One ; 18(4): e0284563, 2023.
Article in English | MEDLINE | ID: mdl-37083575

ABSTRACT

Network approaches have successfully been used to help reveal complex mechanisms of diseases including Chronic Obstructive Pulmonary Disease (COPD). However, despite recent advances, we remain limited in our ability to incorporate protein-protein interaction (PPI) network information with omics data for disease prediction. New deep learning methods, including the convolutional Graph Neural Network (ConvGNN), have shown great potential for disease classification using transcriptomics data and known PPI networks from existing databases. In this study, we first reconstructed the COPD-associated PPI network through the AhGlasso (Augmented High-Dimensional Graphical Lasso Method) algorithm based on one independent transcriptomics dataset including COPD cases and controls. Then we extended the existing ConvGNN methods to successfully integrate COPD-associated PPI, proteomics, and transcriptomics data and developed a prediction model for COPD classification. This approach improves accuracy over several conventional classification methods and neural networks that do not incorporate network information. We also demonstrated that the updated COPD-associated network developed using AhGlasso further improves prediction accuracy. Although deep neural networks often achieve superior statistical power in classification compared to other methods, it can be very difficult to explain how a model, especially a graph neural network, makes decisions on the given features, and to identify the features that contribute the most to prediction generally and individually. To better explain how the spectral-based Graph Neural Network model works, we applied one unified explainable machine learning method, SHapley Additive exPlanations (SHAP), and identified CXCL11, IL-2, CD48, KIR3DL2, TLR2, BMP10 and several other relevant COPD genes in subnetworks of the ConvGNN model for COPD prediction. Finally, Gene Ontology (GO) enrichment analysis identified glycosaminoglycan, heparin signaling, and carbohydrate derivative signaling pathways as significantly enriched among the top important genes/proteins for COPD classification.


Subjects
Deep Learning , Chronic Obstructive Pulmonary Disease , Humans , Multiomics , Computer Neural Networks , Algorithms , Chronic Obstructive Pulmonary Disease/genetics , Bone Morphogenetic Proteins
5.
J Am Heart Assoc ; 12(9): e028483, 2023 05 02.
Article in English | MEDLINE | ID: mdl-37119087

ABSTRACT

Background Rhythm management is a complex decision for patients with atrial fibrillation (AF). Although clinical trials have identified subsets of patients who might benefit from a given rhythm-management strategy, for individual patients it is not always clear which strategy is expected to have the greatest mortality benefit or durability. Methods and Results In this investigation 52 547 patients with a new atrial fibrillation diagnosis between 2010 and 2020 were retrospectively identified. We applied a type of artificial intelligence called tabular Q-learning to identify the optimal initial rhythm-management strategy, based on a composite outcome of mortality, change in treatment, and sustainability of the given treatment, termed the reward function. We first applied an unsupervised learning algorithm using a variational autoencoder with K-means clustering to cluster atrial fibrillation patients into 8 distinct phenotypes. We then fit a Q-learning algorithm to predict the best outcome for each cluster. Although rate-control strategy was most frequently selected by treating providers, the outcome was superior for rhythm-control strategies across all clusters. Subjects in whom provider-selected treatment matched the Q-table recommendation had fewer total deaths (4 [8.5%] versus 473 [22.4%], odds ratio=0.32, P=0.02) and a greater reward (P=4.8×10⁻⁶). We then demonstrated application of dynamic learning by updating the Q-table prospectively using batch gradient descent, in which the optimal strategy in some clusters changed from cardioversion to ablation. Conclusions Tabular Q-learning provides a dynamic and interpretable approach to apply artificial intelligence to clinical decision-making for atrial fibrillation. Further work is needed to examine application of Q-learning prospectively in clinical patients.
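The tabular Q-learning idea in this abstract can be sketched as follows. This is a minimal illustration, not the study's implementation: states stand in for patient clusters, actions for initial strategies, and every reward value, the learning rate `alpha`, and the discount `gamma` are invented for the example. Each episode is treated as a single decision (no next state), which is an assumption.

```python
ACTIONS = ["rate_control", "antiarrhythmic", "cardioversion", "ablation"]

def q_update(q, state, action, reward, next_state=None, alpha=0.1, gamma=0.9):
    """One tabular Q-learning update: Q <- Q + alpha * (target - Q)."""
    best_next = max(q[next_state].values()) if next_state is not None else 0.0
    target = reward + gamma * best_next
    q[state][action] += alpha * (target - q[state][action])
    return q[state][action]

def greedy_policy(q):
    """Pick the highest-valued action for every state."""
    return {state: max(values, key=values.get) for state, values in q.items()}

# Q-table with one row per (hypothetical) patient cluster.
q_table = {cluster: {a: 0.0 for a in ACTIONS} for cluster in range(2)}

# Invented (cluster, action, reward) observations.
observations = [
    (0, "rate_control", 0.2),
    (0, "ablation", 1.0),
    (1, "cardioversion", 0.8),
    (1, "rate_control", 0.1),
]
for _ in range(50):
    for cluster, action, reward in observations:
        q_update(q_table, cluster, action, reward)

policy = greedy_policy(q_table)
print(policy)
```

Because the table is just a dictionary of numbers per cluster and action, the recommended strategy for each cluster can be read off directly, which is the interpretability property the abstract emphasizes.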


Subjects
Atrial Fibrillation , Humans , Atrial Fibrillation/therapy , Atrial Fibrillation/drug therapy , Anti-Arrhythmia Agents/therapeutic use , Retrospective Studies , Artificial Intelligence , Electric Countershock
6.
Bioinformatics ; 39(1)2023 01 01.
Article in English | MEDLINE | ID: mdl-36548341

ABSTRACT

MOTIVATION: Biological networks can provide a system-level understanding of underlying processes. In many contexts, networks have a high degree of modularity, i.e. they consist of subsets of nodes, often known as subnetworks or modules, which are highly interconnected and may perform separate functions. To investigate the association between an identified module and a variable of interest in subsequent analyses, a module summarization that best explains the module's information and reduces dimensionality is often needed. Conventional approaches for obtaining network representation typically rely only on the profiles of the nodes within the network while disregarding the inherent network topological information. RESULTS: In this article, we propose NetSHy, a hybrid approach which is capable of reducing the dimension of a network while incorporating topological properties to aid the interpretation of the downstream analyses. In particular, NetSHy applies principal component analysis (PCA) on a combination of the node profiles and the well-known Laplacian matrix derived directly from the network similarity matrix to extract a summarization at a subject level. Simulation scenarios based on random and empirical networks at varying network sizes and sparsity levels show that NetSHy outperforms the conventional PCA approach applied directly on node profiles, in terms of recovering the true correlation with a phenotype of interest and maintaining a higher amount of explained variation in the data when networks are relatively sparse. The robustness of NetSHy is also demonstrated by a more consistent correlation with the observed phenotype as the sample size decreases. Lastly, a genome-wide association study is performed as an application of a downstream analysis, where NetSHy summarization scores on the biological networks identify more significant single nucleotide polymorphisms than the conventional network representation. 
AVAILABILITY AND IMPLEMENTATION: R code implementation of NetSHy is available at https://github.com/thaovu1/NetSHy. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
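A rough sketch of the summarization idea, under stated assumptions: the abstract only says PCA is applied to "a combination" of node profiles and the Laplacian, so the code below takes one simple reading, multiplying the subject-by-node profile matrix by the graph Laplacian and using the first principal component as the subject-level score. It is written in Python rather than the authors' R and uses power iteration instead of a library PCA. All names and data are hypothetical; the repository linked above is the authoritative implementation.

```python
def laplacian(adj):
    """Graph Laplacian L = D - A for a symmetric adjacency matrix (zero diagonal)."""
    n = len(adj)
    return [[(sum(adj[i]) if i == j else 0.0) - adj[i][j] for j in range(n)]
            for i in range(n)]

def matmul(a, b):
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def first_pc_scores(x, iters=200):
    """Scores on the first principal component, found by power iteration."""
    means = [sum(col) / len(x) for col in zip(*x)]
    xc = [[v - m for v, m in zip(row, means)] for row in x]
    p = len(xc[0])
    v = [float(k + 1) for k in range(p)]  # non-uniform starting vector
    for _ in range(iters):
        xv = [sum(a * b for a, b in zip(row, v)) for row in xc]
        w = [sum(xc[i][j] * xv[i] for i in range(len(xc))) for j in range(p)]
        norm = sum(c * c for c in w) ** 0.5
        if norm == 0:
            break
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in xc]

# Toy module: 3 nodes in a path, 4 subjects (invented values).
adjacency = [[0.0, 1.0, 0.0], [1.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
profiles = [[1.0, 2.0, 0.5], [0.2, 1.5, 1.0], [2.0, 0.1, 0.3], [1.1, 0.9, 2.2]]

lap = laplacian(adjacency)
topology_aware = matmul(profiles, lap)   # profiles combined with topology
scores = first_pc_scores(topology_aware)  # one summary score per subject
print(scores)
```

The resulting per-subject scores could then be used as a single trait in a downstream analysis such as the association study mentioned in the abstract.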


Subjects
Genome-Wide Association Study , Single Nucleotide Polymorphism , Computer Simulation , Principal Component Analysis , Sample Size
7.
Sensors (Basel) ; 22(20)2022 Oct 21.
Article in English | MEDLINE | ID: mdl-36298427

ABSTRACT

Human gait analysis presents an opportunity to study complex spatiotemporal data transpiring as co-movement patterns of multiple moving objects (i.e., human joints). Such patterns are acknowledged as movement signatures specific to an individual, offering the possibility to identify each individual based on unique gait patterns. We present a spatiotemporal deep learning model, dubbed ST-DeepGait, to featurize spatiotemporal co-movement patterns of human joints, and accordingly classify such patterns to enable human gait recognition. To this end, the ST-DeepGait model architecture is designed according to the spatiotemporal human skeletal graph in order to impose learning the salient local spatial dynamics of gait as they occur over time. Moreover, we employ a multi-layer RNN architecture to induce a sequential notion of gait cycles in the model. Our experimental results show that ST-DeepGait can achieve recognition accuracy rates over 90%. Furthermore, we qualitatively evaluate the model with the class embeddings to show interpretable separability of the features in geometric latent space. Finally, to evaluate the generalizability of our proposed model, we perform a zero-shot detection on 10 classes of data completely unseen during training and achieve a recognition accuracy rate of 88% overall. With this paper, we also contribute our gait dataset captured with an RGB-D sensor containing approximately 30 video samples of each subject for 100 subjects totaling 3087 samples. While we use human gait analysis as a motivating application to evaluate ST-DeepGait, we believe that this model can be simply adopted and adapted to study co-movement patterns of multiple moving objects in other applications such as in sports analytics and traffic pattern analysis.


Subjects
Deep Learning , Humans , Gait , Gait Analysis
8.
JMIR Form Res ; 6(8): e36443, 2022 Aug 11.
Article in English | MEDLINE | ID: mdl-35969422

ABSTRACT

BACKGROUND: Despite the numerous studies evaluating various rhythm control strategies for atrial fibrillation (AF), determination of the optimal strategy in a single patient is often based on trial and error, with no one-size-fits-all approach based on international guidelines/recommendations. The decision, therefore, remains personal and lends itself well to help from a clinical decision support system, specifically one guided by artificial intelligence (AI). QRhythm utilizes a 2-stage machine learning (ML) model to identify the optimal rhythm management strategy in a given patient based on a set of clinical factors, in which the model first uses supervised learning to predict the actions of an expert clinician and then identifies the best strategy through reinforcement learning to obtain the best clinical outcome, a composite of symptomatic recurrence, hospitalization, and stroke. OBJECTIVE: We qualitatively evaluated a novel, AI-based, clinical decision support system (CDSS) for AF rhythm management, called QRhythm, which uses both supervised and reinforcement learning to recommend either a rate control or one of 3 types of rhythm control strategies (external cardioversion, antiarrhythmic medication, or ablation) based on individual patient characteristics. METHODS: Thirty-three clinicians, including cardiology attendings and fellows and internal medicine attendings and residents, performed an assessment of QRhythm, followed by a survey to assess relative comfort with automated CDSS in rhythm management and to examine areas for future development. RESULTS: The 33 providers were surveyed with training levels ranging from resident to fellow to attending. Of the characteristics of the app surveyed, safety was most important to providers, with an average importance rating of 4.7 out of 5 (SD 0.72). This priority was followed by clinical integrity (a desire for the advice provided to make clinical sense; importance rating 4.5, SD 0.9), backward interpretability (transparency in the population used to create the algorithm; importance rating 4.3, SD 0.65), transparency of the algorithm (reasoning underlying the decisions made; importance rating 4.3, SD 0.88), and provider autonomy (the ability to challenge the decisions made by the model; importance rating 3.85, SD 0.83). Providers who used the app ranked the integrity of recommendations as their highest concern with ongoing clinical use of the model, followed by efficacy of the application and patient data security. Trust in the app varied; 1 (17%) provider responded that they somewhat disagreed with the statement, "I trust the recommendations provided by the QRhythm app," 2 (33%) providers responded with neutrality to the statement, and 3 (50%) somewhat agreed with the statement. CONCLUSIONS: Safety of ML applications was the highest priority of the providers surveyed, and trust of such models remains varied. Widespread clinical acceptance of ML in health care is dependent on how much providers trust the algorithms. Building this trust involves ensuring transparency and interpretability of the model.

9.
Front Big Data ; 5: 894632, 2022.
Article in English | MEDLINE | ID: mdl-35811829

ABSTRACT

Chronic obstructive pulmonary disease (COPD) is one of the leading causes of death in the United States. COPD represents one of many areas of research where identifying complex pathways and networks of interacting biomarkers is an important avenue toward studying disease progression and potentially discovering cures. Recently, sparse multiple canonical correlation network analysis (SmCCNet) was developed to identify complex relationships between omics associated with a disease phenotype, such as lung function. SmCCNet uses two omics datasets and an associated output phenotype to generate a multi-omics graph, which can then be used to explore relationships between omics in the context of a disease. Detecting significant subgraphs within this multi-omics network, i.e., subgraphs which exhibit high correlation to a disease phenotype and high inter-connectivity, can help clinicians identify complex biological relationships involved in disease progression. The current approach to identifying significant subgraphs relies on hierarchical clustering, which can be used to inform clinicians about important pathways involved in the disease or phenotype of interest. The reliance on a hierarchical clustering approach can hinder subgraph quality by biasing toward finding more compact subgraphs and removing larger significant subgraphs. This study aims to introduce new significant subgraph detection techniques. In particular, we introduce two subgraph detection methods, dubbed Correlated PageRank and Correlated Louvain, by extending the Personalized PageRank Clustering and Louvain algorithms, as well as a hybrid approach combining the two proposed methods, and compare them to the hierarchical method currently in use. The proposed methods show significant improvement in the quality of the subgraphs produced when compared to the current state of the art.
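For orientation, the Personalized PageRank building block that the abstract says is extended can be sketched as below. This is the generic algorithm only; how the paper's Correlated PageRank incorporates phenotype correlation is not described here, so using per-node seed weights (for example, each node's absolute correlation with the phenotype) is an assumption, as are the toy graph and the damping value.

```python
def personalized_pagerank(adj, seed_weights, damping=0.85, iters=100):
    """Power-iteration Personalized PageRank on a weighted adjacency matrix.

    seed_weights sets where the random walk restarts; they are normalized
    to sum to 1 and act as the personalization (teleport) vector.
    """
    n = len(adj)
    total = sum(seed_weights)
    teleport = [w / total for w in seed_weights]
    out_weight = [sum(row) for row in adj]
    rank = teleport[:]
    for _ in range(iters):
        new_rank = [(1 - damping) * t for t in teleport]
        for i in range(n):
            if out_weight[i] == 0:
                # Dangling node: return its mass to the teleport vector.
                for j in range(n):
                    new_rank[j] += damping * rank[i] * teleport[j]
            else:
                for j in range(n):
                    if adj[i][j]:
                        new_rank[j] += damping * rank[i] * adj[i][j] / out_weight[i]
        rank = new_rank
    return rank

# Toy path graph 0 - 1 - 2 - 3 with all restart mass on node 0.
path = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
ranks = personalized_pagerank(path, seed_weights=[1, 0, 0, 0])
print(ranks)
```

Nodes with high rank relative to the seeds form a candidate subgraph; a cut-off or sweep over the ranked nodes would then choose its boundary.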

10.
JMIR Form Res ; 6(4): e34827, 2022 Apr 12.
Article in English | MEDLINE | ID: mdl-35412460

ABSTRACT

BACKGROUND: Management of chronic recurrent medical conditions (CRMCs), such as migraine headaches, chronic pain, and anxiety/depression, remains a major challenge for modern providers. Our team has developed an edge-based, semiautomated mobile health (mHealth) technology called iMTracker that employs the N-of-1 trial approach to allow self-management of CRMCs. OBJECTIVE: This study examines the patterns of adoption, identifies CRMCs that users selected for self-application, and explores barriers to use of the iMTracker app. METHODS: This is a feasibility pilot study with internet-based recruitment that ran from May 15, 2019, to December 23, 2020. We recruited 180 patients to pilot test the iMTracker app for user-selected CRMCs for a 3-month period. Patients were administered surveys before and after the study. RESULTS: We found reasonable usage rates: a total of 73/103 (70.9%) patients who were not lost to follow-up reported the full 3-month use of the app. Most users chose to use the iMTracker app to self-manage chronic pain (other than headaches; 80/212, 37.7%), followed by headaches in 36/212 (17.0%) and mental health (anxiety and depression) in 27/212 (12.8%). The recurrence rate of CRMCs was at least weekly in over 93% (169/180) of patients, with 36.1% (65/180) of CRMCs recurring multiple times in a day, 41.7% (75/180) daily, and 16.1% (29/180) weekly. We found that the main barriers to use were the design and technical function of the app, but that use of the app resulted in an improvement in confidence in the efficiency and safety/privacy of this approach. CONCLUSIONS: The iMTracker app provides a feasible platform for the N-of-1 trial approach to self-management of CRMCs, although internet-based recruitment provided limited follow-up, suggesting that in-person evaluation may be needed. The rate of CRMC recurrence was high enough to allow the N-of-1 trial assessment for most traits.

11.
Article in English | MEDLINE | ID: mdl-36776768

ABSTRACT

The study of complex behavior of biological systems has become increasingly dependent on evolutionary network modeling. In particular, multi-omics networks capture interactions between biomolecules such as proteins and metabolites, providing a basis for predicting relationships between such biomolecules and various phenotypic traits of complex diseases. In this paper, we introduce an integrative framework that, given a multi-omics network representing a cohort of subjects, learns expressive representations for network nodes and combines the learned node representations with the biological profiles of individual subjects for an enriched representation of the subjects. With extensive empirical evaluation using real-world multi-omics networks, we show that our proposed framework significantly outperforms existing and baseline methods in terms of subject representation accuracy, particularly when the multi-omics network representing the cohort is sparse and structured and, therefore, more informative.

12.
PLoS One ; 16(8): e0255337, 2021.
Article in English | MEDLINE | ID: mdl-34432807

ABSTRACT

Chronic Obstructive Pulmonary Disease (COPD) is the third leading cause of mortality in the United States; however, COPD has heterogeneous clinical phenotypes. This is the first large scale attempt which uses transcriptomics, proteomics, and metabolomics (multi-omics) to determine whether there are molecularly defined clusters with distinct clinical phenotypes that may underlie the clinical heterogeneity. The study included 3,278 subjects from the COPDGene cohort with at least one of the following profiles: whole blood transcriptomes (2,650 subjects); plasma proteomes (1,013 subjects); and plasma metabolomes (1,136 subjects). A total of 489 subjects had all three contemporaneous -omics profiles. Autoencoder embeddings were performed individually for each -omics dataset. Embeddings underwent subspace clustering using MineClus, either individually by -omics or combined, followed by recursive feature selection based on Support Vector Machines. Clusters were tested for associations with clinical variables. Optimal single -omics clustering typically resulted in two clusters. Although there was overlap for individual -omics cluster membership, each -omics cluster tended to be defined by unique molecular pathways. For example, prominent molecular features of the metabolome-based clustering included sphingomyelin, while key molecular features of the transcriptome-based clusters were related to immune and bacterial responses. We also found that when we integrated the -omics data at a later stage, we identified subtypes that varied based on age, severity of disease, diffusing capacity of the lungs for carbon monoxide, and percent with atrial fibrillation. In contrast, when we integrated the -omics data at an earlier stage by treating all data sets equally, there were no clinical differences between subtypes. Similar to clinical clustering, which has revealed multiple heterogeneous clinical phenotypes, we show that transcriptomics, proteomics, and metabolomics tend to define clusters of COPD patients with different clinical characteristics. Thus, integrating these different -omics data sets affords additional insight into the molecular nature of COPD and its heterogeneity.


Subjects
Gene Expression Profiling/methods , Metabolomics/methods , Proteomics/methods , Chronic Obstructive Pulmonary Disease/classification , Age Factors , Aged , Cluster Analysis , Factual Databases , Female , Gene Expression Regulation , Gene Regulatory Networks , Humans , Male , Middle Aged , Phenotype , Chronic Obstructive Pulmonary Disease/blood , Chronic Obstructive Pulmonary Disease/genetics , Support Vector Machine
13.
Nat Commun ; 12(1): 1652, 2021 03 12.
Article in English | MEDLINE | ID: mdl-33712618

ABSTRACT

Annotation of polyadenylation sites from short-read RNA sequencing alone is a challenging computational task. Other algorithms rooted in DNA sequence predict potential polyadenylation sites; however, in vivo expression of a particular site varies based on a myriad of conditions. Here, we introduce aptardi (alternative polyadenylation transcriptome analysis from RNA-Seq data and DNA sequence information), which leverages both DNA sequence and RNA sequencing in a machine learning paradigm to predict expressed polyadenylation sites. Specifically, as input aptardi takes DNA nucleotide sequence, genome-aligned RNA-Seq data, and an initial transcriptome. The program evaluates these initial transcripts to identify expressed polyadenylation sites in the biological sample and refines transcript 3'-ends accordingly. The average precision of the aptardi model is twice that of a standard transcriptome assembler. In particular, the recall of the aptardi model (the proportion of true polyadenylation sites detected by the algorithm) is improved by over three-fold. Also, the model, trained using the Human Brain Reference RNA commercial standard, performs well when applied to RNA-sequencing samples from different tissues and different mammalian species. Finally, aptardi's input is simple to compile and its output is easily amenable to downstream analyses such as quantitation and differential expression.


Subjects
High-Throughput Nucleotide Sequencing/methods , Polyadenylation , Transcriptome , Animals , Base Sequence , Gene Expression Profiling , Humans , RNA/chemistry , RNA/metabolism , RNA Sequence Analysis , Systems Biology
14.
Front Genet ; 12: 760299, 2021.
Article in English | MEDLINE | ID: mdl-35154240

ABSTRACT

Biological networks are often inferred through Gaussian graphical models (GGMs) using gene or protein expression data only. GGMs identify conditional dependence by estimating a precision matrix between genes or proteins. However, conventional GGM approaches often ignore prior knowledge about protein-protein interactions (PPI). Recently, several groups have extended GGM to weighted graphical Lasso (wGlasso) and network-based gene set analysis (Netgsa) and have demonstrated the advantages of incorporating PPI information. However, these methods are either computationally intractable for large-scale data, or disregard weights in the PPI networks. To address these shortcomings, we extended the Netgsa approach and developed an augmented high-dimensional graphical Lasso (AhGlasso) method to incorporate edge weights in known PPI with omics data for global network learning. This new method outperforms weighted graphical Lasso-based algorithms with respect to computational time in simulated large-scale data settings while achieving better or comparable prediction accuracy of node connections. AhGlasso runs approximately five times faster than weighted Glasso methods when the graph size ranges from 1,000 to 3,000 with a fixed sample size (n = 300). The runtime difference between AhGlasso and weighted Glasso increases as the graph size increases. Using proteomic data from a study on chronic obstructive pulmonary disease, we demonstrate that AhGlasso improves protein network inference compared to the Netgsa approach by incorporating PPI information.
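The link between a precision matrix and conditional dependence, which underlies every GGM mentioned in this abstract, can be shown with a small sketch. It uses the standard identity that the partial correlation between variables i and j is the negated off-diagonal precision entry scaled by the two diagonal entries. The precision matrix below is a made-up example, and the sketch does not implement AhGlasso, wGlasso, or Netgsa.

```python
import math

def partial_correlations(precision):
    """Partial correlation matrix: rho_ij = -theta_ij / sqrt(theta_ii * theta_jj)."""
    p = len(precision)
    rho = [[1.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            if i != j:
                rho[i][j] = -precision[i][j] / math.sqrt(
                    precision[i][i] * precision[j][j]
                )
    return rho

def network_edges(precision, threshold=1e-8):
    """Edges are pairs whose partial correlation is non-zero (above threshold)."""
    rho = partial_correlations(precision)
    p = len(rho)
    return [(i, j) for i in range(p) for j in range(i + 1, p)
            if abs(rho[i][j]) > threshold]

# Hypothetical sparse precision matrix for three proteins.
theta = [
    [2.0, -1.0, 0.0],
    [-1.0, 2.0, -1.0],
    [0.0, -1.0, 2.0],
]
rho = partial_correlations(theta)
edges = network_edges(theta)
print(edges)
```

The zero entry between the first and third protein means they are conditionally independent given the second, so no edge is drawn between them; Lasso-type penalties encourage exactly this kind of sparsity in the estimate.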

15.
PLoS One ; 13(10): e0206153, 2018.
Article in English | MEDLINE | ID: mdl-30372463

ABSTRACT

Low levels of physical activity are associated with increased mortality risk, especially in cardiac patients, but most studies are based on self-report. Cardiac implantable electronic devices (CIEDs) offer an opportunity to collect data for longer periods of time. However, there is limited agreement on the best approaches for quantification of activity measures due to the time series nature of the data. We examined physical activity time series data from 235 subjects with CIEDs and at least 365 days of uninterrupted measures. Summary statistics for raw daily physical activity (minutes/day), including statistical moments (e.g., mean, standard deviation, skewness, kurtosis), time series regression coefficients, frequency domain components, and forecasted predicted values, were calculated for each individual, and used to predict occurrence of ventricular tachycardia (VT) events as recorded by the device. In unsupervised analyses using principal component analysis, we found that while certain features tended to cluster near each other, most provided a reasonable spread across activity space without a large degree of redundancy. In supervised analyses, we found several features that were associated with the outcome (P < 0.05) in univariable and multivariable approaches, but few were consistent across models. Using a machine-learning approach in which the data was split into training and testing sets, and models ranging in complexity from simple univariable logistic regression to ensemble decision trees were fit, there was no improvement in classification of risk over naïve methods for any approach. Although standard approaches identified summary features of physical activity data that were correlated with risk of VT, machine-learning approaches found that none of these features provided an improvement in classification. 
Future studies are needed to explore and validate methods for feature extraction and machine learning in classification of VT risk based on device-measured activity.
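Two of the feature families named in this abstract, statistical moments and a time series regression coefficient, can be sketched for a single subject's daily activity series. This is illustrative only: the study's exact feature definitions are not given here, so population (not sample) moments and a simple least-squares slope against day index are assumptions, and the input values are invented.

```python
import math

def moment_features(series):
    """Mean, standard deviation, skewness and kurtosis (population moments)."""
    n = len(series)
    mean = sum(series) / n
    deviations = [x - mean for x in series]
    variance = sum(d * d for d in deviations) / n
    sd = math.sqrt(variance)
    skew = sum(d ** 3 for d in deviations) / n / sd ** 3 if sd > 0 else 0.0
    kurt = sum(d ** 4 for d in deviations) / n / sd ** 4 if sd > 0 else 0.0
    return {"mean": mean, "sd": sd, "skewness": skew, "kurtosis": kurt}

def linear_trend(series):
    """Least-squares slope of activity against day index (minutes/day per day)."""
    n = len(series)
    t_mean = (n - 1) / 2
    y_mean = sum(series) / n
    numerator = sum((t - t_mean) * (y - y_mean) for t, y in enumerate(series))
    denominator = sum((t - t_mean) ** 2 for t in range(n))
    return numerator / denominator

# Invented daily activity values (minutes/day) for one subject.
activity = [1.0, 2.0, 3.0, 4.0, 5.0]
features = moment_features(activity)
features["trend"] = linear_trend(activity)
print(features)
```

Each subject's series is reduced to one row of such features, which is the form a logistic regression or tree ensemble classifier expects as input.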


Subjects
Implantable Defibrillators , Exercise/physiology , Ventricular Tachycardia/classification , Decision Trees , Female , Humans , Logistic Models , Machine Learning , Male , Pilot Projects , Ventricular Tachycardia/physiopathology